The announcement of the discovery of a Higgs boson-like particle at CERN will be remembered as
one of the milestones of the scientific endeavor of the 21st century. In this paper we present a study
of information spreading processes on Twitter before, during and after the announcement of the
discovery of a new particle with the features of the elusive Higgs boson on 4th July 2012. We report
evidence for non-trivial spatio-temporal patterns in user activities at individual and global level,
such as tweeting, re-tweeting and replying to existing tweets. We provide a possible explanation for
the observed time-varying dynamics of user activities during the spreading of this scientific "gossip".
We model the information spreading in the corresponding network of individuals who posted a tweet
related to the Higgs boson discovery. Finally, we show that we are able to reproduce the global
behavior of about 500,000 individuals with remarkable accuracy.



The Higgs boson, whose existence has been hypothe- 
sized in 1964 pQ, has gained the title of the most elusive 
particle in modern science. The search for its existence 
has been among the top research priorities of the particle
physics community for nearly 50 years. The year 2012 will
probably be remembered as one of the most important years
in this century for physics: on 4th July 2012 the ATLAS
and CMS collaborations, two international experiments
involved in the search for the Higgs boson, announced the
discovery of a new particle with the features
of the elusive Higgs boson, the missing component of the
Standard Model.

The elusive nature of the Higgs boson required the de- 
velopment of a new generation of large-scale experimen- 
tal facilities, resulting in the construction of the Large 
Hadron Collider (LHC) at CERN, in Geneva (Switzerland),
the largest and most powerful particle accelerator
ever built. The other facility able to find hints
about the existence of the Higgs boson is the Tevatron
at Batavia, IL (USA). The association of the Higgs bo- 
son to the idea of the final understanding of our Universe 
and the possibility of the Grand Unified Theory [5H5] is 
likely to be responsible for the huge popularity of this 
research project in both academic and non-academic cir- 
cles. Indeed, the interest from both specialized and pop- 
ular media increased after the "God particle" nickname 
was assigned to the Higgs boson [BJ. 

The announcement of this discovery was the first of 
this kind in the era of global online social media, such as 
Twitter: the entire world followed and discussed the news 
and updates through them, commenting and providing 
personal views about the event. All this information is 
publicly available online and represents an extremely in- 
teresting source of data for analyzing the global dynamics 
of this scientific gossip around the world. 

On 2nd July, initial results were presented by the Tevatron
team, but they were not sufficient to claim a scientific
discovery. The statistical significance of all the
combined analyses was 2.9 sigma, equivalent to a 1-in-550
chance that the signal was due to a statistical fluctuation [7].
Although of remarkable importance for the
scientific community, this announcement had a weak
impact on the general public. It was followed by a
strong expectation, accompanied by rumors, for the corresponding
results from the CERN teams. An unofficial
video was even leaked during those days [5]. The spreading
of these rumors about a possible discovery attracted
the interest of the media, also outside the academic community,
until the official announcement on 4th
July during the International Conference on High-Energy
Physics 2012 in Melbourne, Australia.

We can summarize the events before and after the discovery
of the boson by dividing them into four periods:

• Period I: Before the announcement on 2nd July,
there were some rumors about the discovery of a
Higgs-like boson at Tevatron;

• Period II: On 2nd July at 1 PM GMT, scientists
from the CDF and D0 experiments, based at Tevatron,
presented results indicating that the Higgs particle
should have a mass between 115 and 135 GeV/c^2
(corresponding to about 123-144 times the mass of
the proton) [7];

• Period III: After 2nd July and before 4th July
there were many rumors about the Higgs boson discovery
at LHC [5];

• Period IV: The main event was the announcement
on 4th July at 8 AM GMT by the scientists
from the ATLAS and CMS experiments, based at
CERN, presenting results indicating the existence
of a new particle, compatible with the Higgs boson,
with mass around 125 GeV/c^2 [HI [TO]. After
4th July, popular media covered the event.

In this paper, we present the anatomy of the spreading
of this scientific gossip by following and analyzing
the related Twitter user activity during and after the
announcement. More specifically, we consider the messages
posted on Twitter about this discovery between 1st
and 7th July 2012. We report evidence for non-trivial
spatio-temporal patterns in user activities at individual
and global level, such as tweeting, re-tweeting or replying to
existing tweets. Well-defined trends can be associated with
different periods of time. Abrupt changes can be linked
to the key events around the announcement. We analyze
the activity patterns of the individuals that tweeted
about this discovery over the period taken into consideration.
We propose a model for the information spreading
over the Twitter network. Finally, we show that we
are able to reproduce the global behavior of more than
500,000 individuals with remarkable accuracy.



Results 
Overview of the Dataset 

Our dataset consists of messages posted on the Twitter
social network, crawled by means of the Application
Programming Interface (API) made available by the
service itself. We collected tweets sent between 00:00
on 1st July 2012 and 23:59 on 7th July 2012 containing
at least one of the following keywords or hashtags:
lhc, cern, boson, higgs (see Methods). The final
number of tweets we analyzed was 985,590. We then
built the corresponding social network of the authors of
the tweets: the resulting graph is composed of 456,631
nodes and 14,855,875 directed edges. Nodes correspond
to the authors of the tweets and edges represent the
followee/follower relationships between them. We discarded
70,838 users from the original dataset of 527,469
users because their lists of followees and followers were
not accessible due to privacy settings. Twitter
users can specify their location by filling in the Location
field of their profile, on an optional basis and at different
levels of granularity (e.g., United States, New York, Chelsea,
etc.). We use this information, when available, to assign
a geographic position to each tweet: the resulting number
of geo-located tweets is 632,027 (see Methods).
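The selection rule and the user counts quoted above can be illustrated with a short sketch. The collection itself was performed through the Twitter API and is described in Methods; the function `matches_keywords` below is a hypothetical helper, and case-insensitive whole-word matching is an assumption made here for illustration only.

```python
import re

# Keywords listed in the text; case-insensitive matching is an assumption.
KEYWORDS = ("lhc", "cern", "boson", "higgs")
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def matches_keywords(text):
    """Return True if the text contains one of the keywords as a word or hashtag."""
    return _PATTERN.search(text) is not None


# Consistency check of the user counts reported in the text.
total_users = 527_469
discarded_users = 70_838
nodes = total_users - discarded_users  # users whose follower lists were accessible
```

With these numbers, `nodes` equals the 456,631 vertices of the reported graph.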

In Fig. 1 we show the distributions of the in-degree,
out-degree and total degree of the users that tweeted
about the Higgs boson. Intriguingly, the underlying
topology is not trivial. The out-degree distribution shows
a power-law scaling $P(k_{out}) \propto k_{out}^{-\gamma}$ with two different regimes,
each with its own exponent $\gamma$, and a crossover at $k_{out} \approx 200$,
which indicates that very few users follow more than a
few hundred users. Conversely, the in-degree distribution
shows a different behavior: for in-degree smaller than
$k_{in} \approx 100$ the scaling relation is not satisfied, whereas
above this threshold the network exhibits a power-law
scaling $P(k_{in}) \propto k_{in}^{-2.2}$. A standard method to uncover
the presence of correlations in the network is to investigate
the assortative mixing of its nodes [TTJEi]. In fact,
the nodes in the network with a large number of links may
tend to be connected to other nodes with many connections
(assortative mixing with positive assortative index)
or to other nodes with few connections (disassortative
mixing with negative assortative index). In both cases,
the network shows degree correlations resulting in an assortative
index different from zero, at variance with an
uncorrelated network where this index is close to zero.
In our case study we find a value of about -0.14, indicating
the presence of correlations in the network, with
disassortative mixing of users.

Figure 1: Probability density of in-degree, out-degree and
total degree of the nodes that tweeted about the Higgs boson.
The corresponding distributions have been shifted along the
y axis to highlight their structure. Dashed lines are
shown for guidance only. (Horizontal axis: degree, k, from 10^0 to 10^5.)

Figure 2: Number of tweets per second as a function of time
during the period of data collection. The curves correspond
to tweets containing only the CERN, Higgs, LHC keywords and
at least one of them, respectively.
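As an illustration of the assortative index discussed in the text, the following sketch computes the Pearson correlation between the degrees found at the two ends of each edge. It treats the graph as undirected, which is a simplifying assumption: the text does not state which in-degree and out-degree pairs enter the value of about -0.14, so this is not necessarily the exact estimator used for the Twitter network.

```python
from math import sqrt


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each undirected edge.

    Raises ZeroDivisionError for regular graphs, where the degree variance is zero.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so that the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)


# A star graph is maximally disassortative: the hub only touches leaves.
star = [(0, 1), (0, 2), (0, 3)]
r_star = degree_assortativity(star)
```

Negative values, as in the star example, indicate that highly connected vertices tend to attach to poorly connected ones.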



Spatio-temporal Analysis 

In this section, we investigate both spatial and temporal
features of the observed data, i.e., user activity on
Twitter before, during and after the main event on 4th
July 2012. More specifically, we study
user behavior through two different
analyses: the first is performed at a global (macroscopic)
level, while the second is performed at an
individual (microscopic) level.

Macroscopic Level. We consider the entire set of individuals
as a large-scale complex system of interacting
entities, and we analyze the dynamics of this system at a
macroscopic level by inspecting spatio-temporal patterns
of consecutive tweets. The inter-tweets time (space)
is defined as the temporal delay (spatial distance) between
two consecutive tweets posted by any user in the
network.
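The two macroscopic quantities defined above can be sketched as follows. The tuple layout `(timestamp_seconds, latitude, longitude)`, the function names and the use of the haversine great-circle distance are assumptions made for this illustration; the text does not specify how spatial distances were computed.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def inter_tweet_gaps(tweets):
    """Return (inter-tweets time, inter-tweets space) for consecutive tweets.

    tweets: list of (timestamp_seconds, latitude, longitude), posted by any user.
    """
    ordered = sorted(tweets)  # chronological order over the whole network
    gaps = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(ordered, ordered[1:]):
        gaps.append((t1 - t0, haversine_km(la0, lo0, la1, lo1)))
    return gaps


gaps = inter_tweet_gaps([(10, 0.0, 1.0), (0, 0.0, 0.0), (12, 0.0, 1.0)])
```

Histograms of the first and second components of `gaps` correspond to the inter-tweets time and inter-tweets space distributions, respectively.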

In Fig. 2 we show the evolution of the rate of tweets
containing the CERN, Higgs, and LHC keywords. The rate 
shows a rapidly increasing trend up to the day of the 
announcement of the CERN teams, after which it slowly 
decreases. It is worth noting that, when all the keywords 
are considered, the rate of tweets increases from approx- 
imately 36 tweets/hour at the beginning of Period I up 
to about 36,000 tweets/hour at the beginning of Period 
IV. The rumors anticipating the presentation of results at 
Tevatron caused the initial spreading of tweets about the 
Higgs boson. This was further sustained by the subse- 
quent comments to these initial postings and the rumors 
about the results to be presented by the scientists belong- 
ing to the ATLAS and CMS experiments. During a few 
hours after the announcement of the discovery, the rate 
increased by more than one order of magnitude, while it 
slowly decreased in the following days. 

In the top panels of Fig. 3 we show the density of tweets
before (left panel), during (middle panel) and after (right
panel) the main event, on 4th July 2012. In the bottom
panels of the same figure we show the corresponding networks
of users built from re-tweets.

The impact of the announcement on 4th July was truly
global. By contrast, before and after this main event the
countries with a significant number of tweets were European,
probably because CERN is in Switzerland
and the largest group of scientists working there
is from Europe. A large number of tweets were also observed
from the United States, which hosts a very large
community of scientists.

Our first goal is to gain insights into the spatial and
temporal patterns of this complex geographic social network:
to do so, we estimate the distance in time
and space of tweets posted in the network. In Fig. 4(a)
we show the number of tweets as a function of the inter-tweets
time (left panel) and the inter-tweets space (right
panel) between two consecutive messages. In both cases,
we show the distributions corresponding to the period
before, during and after the main event on 4th July,
respectively. While the distribution of inter-tweets spaces
is the same regardless of the time window taken into consideration,
the distribution of inter-tweets times in the
three windows is very different. From a global point of
view, this Twitter activity exhibits long tails before and
after the main event, with a large number of tweets sent
within a few seconds, and a small number sent within a
few minutes. On the other hand, the dynamics of the process
changes dramatically during the main event, when
the inter-tweets time between consecutive tweets is likely
to be less than two seconds and no more than six seconds,
indicating a frenetic user activity. A deeper investigation
at an individual level of this bursty behavior is presented
in the next section.

In order to unveil the presence of spatio-temporal
patterns of individuals with non-trivial relationships, in
Fig. 4(b) we show the joint probability density of inter-tweets
times and inter-tweets spaces before (left panel),
during (middle panel) and after (right panel) the main 
event. Before the main event, consecutive tweets are 
mainly sent at local scales within less than half a minute 
interval, generally within 8 seconds: consecutive tweets 
are more likely to be sent by users living within 20 km, 
although a significant number of tweets is still posted on 
larger inter-tweets spatial scales. The system dynamics 
changes dramatically during the main event: tweets from 
any part of the world are likely to be sent within 2 sec- 
onds without a specific spatial pattern. User activity now 
is frenetic and information is quickly spreading at any 
spatial scale. After the main event, the spatio-temporal 
dynamics tends to become similar to the activity before 
the main event, even if, in this case, users from any part 
of the world are still involved in the process, with no 
apparent prevalence of small or large inter-tweets spaces. 

Microscopic Level. We now analyze the dynamics
at a microscopic level (i.e., treating individuals separately)
by inspecting inter-arrival times of activities such as
tweeting, replying and re-tweeting. In the following,
the inter-activity time for user $u$ is defined by
$\tau_u(i) = t_u(i + 1) - t_u(i)$, where $t_u(i)$ and $t_u(i + 1)$ indicate the
times when user $u$ sent the $i$-th and the $(i + 1)$-th tweets,
respectively.
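The definition of the inter-activity time can be made concrete with a minimal sketch. The input layout, an iterable of `(user, timestamp)` pairs, and the function name are assumptions introduced here for illustration.

```python
from collections import defaultdict


def inter_activity_times(events):
    """Return {user: [tau_u(1), tau_u(2), ...]} from (user, timestamp) pairs.

    tau_u(i) = t_u(i + 1) - t_u(i) is the delay between consecutive
    activities of the same user.
    """
    by_user = defaultdict(list)
    for user, t in events:
        by_user[user].append(t)
    taus = {}
    for user, times in by_user.items():
        times.sort()  # events are not assumed to arrive in chronological order
        taus[user] = [later - earlier for earlier, later in zip(times, times[1:])]
    return taus


taus = inter_activity_times([("a", 5), ("b", 1), ("a", 0), ("a", 65)])
```

A user with a single activity, such as `"b"` in this example, contributes no inter-activity time.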



In Fig. 5(a) we show the distribution of inter-tweets
times $\tau$ (i.e., between consecutive tweets) before and after
the main event (left panel) and during it (right panel). Intriguingly,
before and after the main event, such distributions
show power-law scaling of type $P(\tau) \propto \tau^{-\alpha}$, with
$\alpha \approx 1$, over three decades of inter-tweets times, from the
scale of one minute to the scale of one day.

Timing of user activities is usually modeled using a
Poisson process. However, there is evidence that
the times between subsequent user actions follow
non-Poisson statistics, characterized by bursts of
rapidly occurring events separated by long periods of inactivity.
The bursty nature of user behavior has
been recently attributed [16] to decision-based queuing
processes [17], where individuals tend to act in response
to some perceived priority. According to this model, the
timing of tasks to be executed is heavy-tailed, with rapid
responses in the majority of cases and a few responses
with very long waiting times [16]. Moreover, it has been
shown that bursty user activity patterns might have a
remarkable impact on the spreading dynamics over complex
networks: this dynamics might be related to the
waiting time distribution but it is not sensitive to the
network topology [18]. A similar dynamics is observed
in our data: the distribution of inter-tweets times shown
in the left panel of Fig. 5(a) reflects the bursty nature
of user activities on online social networks, where individuals
are more likely to send several tweets in quick
succession within a few minutes, followed by long periods
of no or reduced activity, up to one day.

Figure 3: Top: heatmap of the density of tweets before (left panel), during (middle panel) and after (right panel) the main
event on 4th July 2012. Bottom: corresponding networks of re-tweets between users. During the announcement, the Twitter
activity is truly global, whereas before and after the announcement, the most active countries were European and American,
due to the large presence of scientists in these geographic areas.

Figure 4: Global spatio-temporal activities of any user in the social network. (a) Number of entries for inter-tweets times (left
panel) and inter-tweets spaces (right panel) between consecutive tweets, before, during and after the main event on 4th July.
The dashed line indicates a power law $\sim \tau^{-2}$ and is for guidance only. (b) Joint probability density of inter-tweets times and
inter-tweets spaces between consecutive tweets before (left panel), during (middle panel) and after (right panel) the main event.
In both cases, only the sub-set of geo-located tweets is considered.

Figure 5: Activity inter-tweets times of users in the social network. (a) Number of entries for inter-tweets times between
consecutive tweets, before and after the main event on 4th July (left panel), and during the main event (right panel). In the
left panel, power-law scaling behavior is visible for certain ranges of values, whereas in the right one, we observe a log-normal fit.
Dashed lines in the left panel are for guidance only, while the dashed line in the right panel indicates the curve corresponding
to our fit. (b) Number of entries for inter-tweets times between replies (left panel) and re-tweets (right panel) during the whole
data collection period.

The distribution of inter-tweets times during the main event
shows a very different behavior, more compatible with
a log-normal law than with a power-law scaling relationship.
In this case, if $\tau$ is the random variable representing
the time to the next tweet, the random variable $\log \tau$ is
normally distributed with mean $\mu$ and standard deviation
$\sigma$. If a user starts to tweet because he or she is triggered
by tweets of users in his or her social neighborhood, the
total number of tweets can pass a threshold value above
which a cascading effect may occur in the network. If this
is the case, we witness a random multiplicative spreading
of information, whose inter-tweets times are distributed
following a log-normal law if the number of vertices involved
in the cascade is large enough. The log-normal
law with $\mu = 5.627 \pm 0.008$ and $\sigma = 1.742 \pm 0.006$ describes
the observed activities during the main event with
remarkable accuracy.
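To make the log-normal description concrete, the sketch below estimates $\mu$ and $\sigma$ as the mean and standard deviation of $\log \tau$. This maximum-likelihood estimator is an assumption: the fitting procedure actually used for the reported values is not described in this section, and the time unit of $\tau$ is not stated here either.

```python
from math import log, exp, sqrt


def fit_lognormal(samples):
    """Return (mu, sigma): mean and standard deviation of the log of the samples."""
    logs = [log(x) for x in samples]
    n = len(logs)
    mu = sum(logs) / n
    sigma = sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


# Toy data whose logarithms are exactly 1 and 3.
mu, sigma = fit_lognormal([exp(1.0), exp(3.0)])

# Median of a log-normal law is exp(mu); here with the value of mu quoted in
# the text, expressed in the (unstated) time unit of the fit.
median_from_reported_mu = exp(5.627)
```

Since the median of a log-normal law is $e^{\mu}$, the reported $\mu$ directly gives the typical inter-tweets time during the main event.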



In Fig. 5(b) we show the distributions of inter-arrival
times for replies (left panel) and re-tweets (right panel)
during the entire data collection period. Intriguingly,
user activities are still characterized by non-Poissonian
behavior. Inter-arrival times for replies follow a power
law $P(\tau) \propto \tau^{-\alpha}$ from a few minutes up to one day: such
a scaling can be explained by the existence of bursty
behavior in the timing of user actions, previously discussed
in the case of inter-tweets times before and after
the main event. It is worth noting that the scaling exponent
is larger for intervals between the original tweets
and replies than for re-tweets. For temporal scales larger
than one day, the power-law scaling is not present; an
exponential cut-off does not model the observed decay.

The case of re-tweets deserves particular attention.
From the time scale of a few minutes up to the time scale
of a few hours, we observe the power-law scaling relationship
$P(\tau) \propto \tau^{-0.8}$, with a cut-off on the time scale of one
day. We model the data with a power law with an exponential
cut-off, $P(\tau) \propto \tau^{-0.8} \exp(-\tau/\tau_0)$, with cut-off
scale $\tau_0 \approx 11$ hours. It is worth remarking that power-law
scaling relationships with exponent $\alpha < 1$ cannot be
normalized and do not occur in nature unless the scaling
deviates from a power law after some threshold value, the
cut-off scale, above which the distribution rapidly falls to
zero. Even in such cases, phenomena exhibiting scaling
exponents smaller than unity are very rare [19].
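The normalization argument can be spelled out in a short derivation. The lower bound $\tau_{min} > 0$ and the upper bound $T$ are symbols introduced here for illustration only.

```latex
% Pure power law with exponent \alpha < 1: the integral grows without bound.
\int_{\tau_{min}}^{T} \tau^{-\alpha} \, d\tau
  = \frac{T^{1-\alpha} - \tau_{min}^{1-\alpha}}{1-\alpha}
  \;\longrightarrow\; \infty
  \quad \text{as } T \to \infty .

% With an exponential cut-off at scale \tau_0 the integral converges
% (substitute u = \tau / \tau_0):
\int_{0}^{\infty} \tau^{-\alpha} e^{-\tau/\tau_0} \, d\tau
  = \tau_0^{1-\alpha} \, \Gamma(1-\alpha),
  \qquad 0 < \alpha < 1 .

% Normalized density:
P(\tau) = \frac{\tau^{-\alpha} e^{-\tau/\tau_0}}{\tau_0^{1-\alpha} \, \Gamma(1-\alpha)} .
```

The cut-off therefore acts as the regularization that makes a distribution with exponent smaller than unity well defined.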



Gossip Spreading 

In this section, we investigate the dynamics of information
spreading in the social network of users who tweeted
about the Higgs boson. Although information
spreading shares some general dynamical features
with the spreading of diseases, their nature is deeply different.
For instance, disease epidemics depend on the
physical contacts between individuals and the different
biological characteristics of both the infectious agent and
the carrier, as well as many other factors [20], whereas
information can also be spread through non-physical contacts
making use of communication infrastructures such
as telephone, television and Internet [21]. Information
is very volatile and it is not subject to incubation periods:
it is either worth spreading or not, and this decision
is made by individuals, unlike the case of disease
spreading. In the last decade, the study of contagion
dynamics, involving either information or disease trans- 
mission, has greatly benefited from key results in complex 
networks modeling [2"2Tl29| : in fact, the structure of so- 
cial relationships plays a fundamental role for any type 
of spreading dynamics |3Tjr[3"2"] . If the underlying topol- 
ogy of the network is homogenous, the dynamics can be 
studied by adopting a mean-field approximation and the 
spreading occurs only if the rate of transmission of in- 
formation exceeds an epidemic threshold. Conversely, 
heterogeneous structures like scale-free networks require
a heterogeneous mean-field approximation [22, 23], involving
the single-site equation governing the time evolution
of the relative density of "infected" vertices with given
connectivity k, i.e., the probability that a vertex with degree
k is infected. Moreover, such networks have the peculiar
property of facilitating the spreading of infections:
in fact, if the corresponding degree distribution shows a diverging
second moment, then the epidemic threshold is
zero independently of the degree correlations [33]. Although
mean-field approximations are fundamental tools
to capture the main features of the spreading dynamics,
particularly in the early stage, the models are less effective
when the finite size of the population becomes a
significant factor. More recent approaches focus on the
probability of transmission of individual vertices [34] and
a non-perturbative formulation of the heterogeneous mean-field
approach [35].

Figure 6: Visualisation of the social network of active users,
based on k-core decomposition and components analysis. The
size of each vertex is proportional to its degree, whereas color
codes the k-coreness. A sample of 10% of the whole network
has been used for this visualisation.

In our analysis, we distinguish between two different
states for users in the social network: "active" and
"non-active" vertices. We indicate with "tweeting
activation" or "gossip spreading" the user-to-user interaction
process for transmitting information related to a
particular topic. In the following, we indicate with
$A(t)$ and $D(t)$ the number of active and non-active users
at time $t$, respectively, with $A(t) + D(t) = N$, where $N$
is the total number of users considered in the social network.
The observed social network of active users is shown in
Fig. 6, where a visualization based on k-core decomposition
and component analysis is presented [36, 37]. The
k-core of a graph is defined as the maximal connected
subgraph in which all vertices have degree at least k. In
practice, a k-core is obtained by recursively removing all
vertices with degree smaller than k, until the degree of
all remaining vertices is larger than or equal to k. The
k-coreness of a vertex is the index of the highest k-core containing
that vertex. Vertices with the highest k-coreness
act as the most influential spreaders of information in the
network. In fact, it has been recently shown that in some
plausible circumstances the best spreaders are not the
most highly connected or the most central people but
those with higher k-coreness [38], and there is evidence
of a positive correlation between k-coreness and the size
of cascades of messages, suggesting that users at the core
of the network are more likely to be the seeds of global
chains of information diffusion [39].

Figure 7: Points indicate the fraction of users who are active
at least once (see the text for more detail) with respect to
the total number of users in the dataset at the end of the
period taken into consideration, i.e., $A^*(t = \text{8 July 2012})$, as
a function of time. Lines indicate the fitting results obtained
separately for each temporal range by adopting the model
given by Eq. (3). The rate of activation $\lambda^*$ for each period is
reported at the bottom of the figure.

The k-core decomposition allows us to identify some
salient features of the observed social network of active
users, uncovering structural properties due to its specific
topology. In Fig. 6 the presence of an inhomogeneous
distribution of vertices in the shells is a signature of non-trivial
correlations. Moreover, the presence of vertices
with high degree in any k-shell, i.e., a very low correlation
between degree and shell-index, indicates that hubs
are likely to be found also in external shells, a behavior
typical of networks without an apparent global hierarchical
structure like the World-Wide Web [36, 37].
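The recursive pruning that defines the k-core can be sketched in a few lines. The version below is a generic peeling procedure on an undirected graph stored as a dictionary of neighbour sets; both the data layout and the choice to ignore edge direction are assumptions made for illustration, and the sketch is not the visualization tool used for Fig. 6.

```python
def core_numbers(adjacency):
    """Return {vertex: k-coreness} for an undirected graph.

    adjacency: {vertex: set of neighbouring vertices}.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    coreness = {}
    k = 0
    while remaining:
        # Vertices that cannot belong to the (k + 1)-core.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # every remaining vertex has degree > k: move to the next shell
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed through another neighbour
            remaining.discard(v)
            coreness[v] = k
            for w in adjacency[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        peel.append(w)
    return coreness


# Triangle a-b-c with a pendant vertex d attached to a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
shells = core_numbers(graph)
```

In the toy graph, vertex `a` has the highest degree but the same coreness as `b` and `c`, which illustrates why degree and shell index need not be strongly correlated.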

Modeling the Dynamics of User Activation without
De-activation. As the first step, we do not consider
the influence of the structure of the network on the process.
We define a node as active at time $t$ if he or she has
tweeted at least once about the Higgs boson up to that
time. In the following, we indicate with $A^*(t)$
and $a^*(t)$ the number and the fraction of active users at
time $t$, respectively.

Hence, the number $A^*(t)$ of active users is expected to
be a monotonically increasing function of time. We divide
the whole period of data collection into four temporal ranges
of interest, corresponding to periods I, II, III and IV,
previously described. In Fig. 7 we show for each period
the observed evolution of the fraction $a^*(t) = A^*(t)/N$ of
active users versus time, where $N$ is the total number
of users in the dataset in the data collection period.

In order to model the evolution of the active users
over time, we first tried to exploit classic susceptible-infected
(SI) models in an unstructured population [40],
but this led to a very poor fit of the data. For this reason,
we developed a new model starting from the observation
of the specific characteristics of our dataset. In general,
once a user has tweeted, we observe that he or she will
probably not tweet significantly about the Higgs boson in
the near future, according to the bursty behavior shown
in the Spatio-temporal Analysis section. Therefore, we make the
simplifying assumption that he or she will not tweet again
after the tweeting activation.
In this case, the number of newly active vertices
at time $t$ is proportional to the number of users who have
not been active before:



$$A^*(t + \Delta t) = A^*(t) + \lambda^* [N - A^*(t)] \Delta t, \qquad (1)$$

where $\lambda^*$ is a constant activation rate. In the limit of
small $\Delta t$, we obtain the following ordinary differential
equation:

$$\frac{da^*(t)}{dt} = \lambda^* [1 - a^*(t)], \qquad (2)$$

corresponding to our model for the fraction of users
tweeting at least once about the Higgs boson. The evolution
over time of $a^*(t)$ is the solution of Eq. (2), given
by

$$a^*(t) = 1 - [1 - a^*(t_k)] e^{-\lambda^* (t - t_k)}, \qquad (3)$$

where $k$ = I, II, III and IV indicates the period of interest,
$t_k$ is the starting date of period $k$ and $a^*(t_k)$ is
the corresponding initial fraction of users active at least
once.

We fit the evolution function given by Eq. (3) to the
observed data for each period of interest: the resulting
model for each case is shown in Fig. 7, demonstrating
the agreement with the data. The activation rate in- 
creases during the four intervals of time taken into con- 
sideration, from about one user per minute on 1 st July 
2012, up to about 519 users per minute in the last period. 
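The activation model without de-activation can be checked numerically with a minimal sketch that compares the closed-form solution with a step-by-step integration of the corresponding rate equation. The parameter values used below are purely illustrative and are not the fitted rates reported for the four periods; the function names are hypothetical.

```python
from math import exp


def active_fraction(t, t_k, a_k, rate):
    """Closed-form fraction of users active at least once at time t >= t_k."""
    return 1.0 - (1.0 - a_k) * exp(-rate * (t - t_k))


def euler_fraction(t, t_k, a_k, rate, steps=100_000):
    """Integrate da/dt = rate * (1 - a) with explicit Euler steps of size dt."""
    a = a_k
    dt = (t - t_k) / steps
    for _ in range(steps):
        a += rate * (1.0 - a) * dt  # newly active fraction in one step
    return a


# Illustrative values only: 5% initially active, rate 0.2 per unit of time.
exact = active_fraction(t=10.0, t_k=0.0, a_k=0.05, rate=0.2)
approx = euler_fraction(t=10.0, t_k=0.0, a_k=0.05, rate=0.2)
```

The two values agree closely for small steps, and the fraction saturates at 1 for long times, as expected for a model in which users never de-activate.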

Modeling the Dynamics of User Activation with
De-activation. In this subsection, we focus on the
propagation of interest in the event through social cascading.
A user is considered non-active in a given time
window $\Delta t$ if he or she has not tweeted in that time interval.
In other words, in this refined model, an active
user can become non-active again (de-activated) if he or
she does not keep tweeting about the Higgs boson. In
the time interval between $t$ and $t + \Delta t$ active users can
become non-active after a certain amount of time for any
reason: we indicate with $\beta(t)$ the probability per unit of
time for the transition from active to non-active state.
Hence, the number of users that become non-active in
the interval $\Delta t$ is given by $\beta(t) A(t) \Delta t$. By introducing
de-activation we also account for the limited visibility
of tweets on the timelines of Twitter clients, i.e., newer
tweets replace older ones. Moreover, we observe that the
number of non-active users at time $t$ that will become
active at time $t + \Delta t$ is a function of both their in-going
degree and the out-going degree of active users at time $t$.

A non-active user connected to more than one active
user at the same time is more likely to become active
than a non-active user connected to only one
active user. Let us indicate with $j_A$ the number of active
users connected to a non-active user. If $\lambda(t)$ indicates
the activation probability per unit of time per link,
for a non-active user with degree $k_{in}$ the probability per
unit of time of changing from non-active to active state
is given by $p_\lambda(t; j_A) = 1 - [1 - \lambda(t)]^{j_A}$. In general, the
probability that such a non-active user is connected to $j_A$
active users at the same time depends on the out-going
degree of active users, i.e., on network vertex-vertex correlations.
More specifically, such a probability depends
on the conditional probability of observing a vertex with
out-going degree $k_{out}$ connected to a vertex with in-going
degree $k_{in}$.

It has been shown that a pure scale-free degree dis- 
tribution with exponent between 2 and 3 is a sufficient 
condition for the absence of an epidemic threshold in 
unstructured networks with arbitrary two-point degree 
correlation function |33j . i.e., correlations at neighbor- 
hood level do not affect the spreading dynamics. We use 
this result as a simplifying assumption for modeling the 
spreading in our network, exhibiting a scale-free degree 
distribution with exponent 2.5 for k > 200. Therefore, 
we neglect correlations and we estimate the probability 
that a non-active user, with in-going degree fc m , is con- 
nected to ja active users, with any out-going degree, by 



P(t;j A ,k in ) 



N-A(t)-l\ 
k in ~jA I 



(N- 
\ k' 



(4) 



accounting for all the possible ways to arrange $A(t)$
activations within $j_A$ users from the total number of
possible combinations of the remaining $N - 1$ users within
$k^{in}$ users.
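
The counting argument above can be checked numerically. The sketch below assumes that, once correlations are neglected, the number of active in-neighbours follows a hypergeometric law: $j_A$ of the $A(t)$ active users and $k^{in} - j_A$ of the $N - A(t) - 1$ other non-active users are drawn among the $N - 1$ remaining users. The function name and the numerical values are hypothetical.

```python
from math import comb


def active_neighbour_pmf(n_users: int, n_active: int, k_in: int, j_active: int) -> float:
    """Hypergeometric probability that exactly `j_active` of the `k_in`
    in-neighbours of a non-active user are active, when `n_active` of the
    other `n_users - 1` users are active and correlations are neglected."""
    others = n_users - 1                   # every user except the one considered
    non_active_others = others - n_active  # non-active users among them
    if non_active_others < 0 or not 0 <= j_active <= k_in <= others:
        return 0.0
    ways = comb(n_active, j_active) * comb(non_active_others, k_in - j_active)
    return ways / comb(others, k_in)


if __name__ == "__main__":
    # Illustrative values only: 1000 users, 50 of them active, in-degree 20.
    probs = [active_neighbour_pmf(1000, 50, 20, j) for j in range(21)]
    print("P(j_A = 0..3):", probs[:4])
    print("sum over j_A :", sum(probs))
```

Summing over all $j_A$ from 0 to $k^{in}$ gives 1 (Vandermonde's identity), which is a convenient sanity check on the normalization.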

Hence, the probability that a non-active user with
in-going degree $k^{in}$ is activated by at least one active user
in its neighborhood is given by

$$P_{k^{in}}(D \to A) = \sum_{j_A = 1}^{k^{in}} P(t; j_A, k^{in}) \, p_\lambda(t; j_A) \qquad (5)$$



It follows that the total probability that non-active users 
will become active per unit of time is given by 



$$\Theta_{\lambda(t)}(t) = \sum_{k^{in}} \mathcal{P}(k^{in}) \, P_{k^{in}}(D \to A), \qquad (6)$$



$\mathcal{P}(k^{in})$ being the probability density of the in-going
degree. It follows that $(N - A(t)) \mathcal{P}(k^{in})$ indicates the
number of non-active users with in-going degree $k^{in}$ at time
$t$. We model the dynamics of the number of active users
in the time interval $\Delta t$ by

$$A(t + \Delta t) = A(t) + \left[ -\beta(t) A(t) + (N - A(t)) \Theta_{\lambda(t)}(t) \right] \Delta t.$$

Therefore, by choosing $\Delta t = 1$, i.e., equal to the time
unit of observation, we obtain the general discrete model

$$A(t + 1) = (1 - \beta(t)) A(t) + (N - A(t)) \Theta_{\lambda(t)}(t), \qquad (7)$$

valid for the general case of activation and de-activation
rates that change over time.

In Eq. (7) the parameters $\tilde{\beta} = \beta \Delta t$ and $\tilde{\lambda} = \lambda \Delta t$
indicate probabilities instead of probability rates. However, in
the particular case of $\Delta t = 1$ it is possible to mix rates
and probabilities because both will have the same values,
even though their units are different [35]: for the sake of
simplicity, in the following we use the notation $\tilde{\beta} = \beta$ and
$\tilde{\lambda} = \lambda$. Eq. (7) represents the balance equation indicating
that the number of active users at a certain instant
is given by the number of vertices that at the previous
instant did not change from active to non-active state
plus the number of newly active users. In the following
we will consider the density of active users defined by
$\rho(t) = A(t)/N$, leading to the evolution equation

$$\rho(t + 1) = (1 - \beta(t)) \rho(t) + (1 - \rho(t)) \Theta_{\lambda(t)}(t). \qquad (8)$$

In general, the solution of Eq. (7) and Eq. (8) cannot
be obtained analytically because of the complexity of
$\Theta_{\lambda(t)}(t)$: therefore, some simplifying assumptions or
numerical methods should be adopted instead.
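
One possible numerical treatment is direct iteration of the density equation. The sketch below assumes the uncorrelated (hypergeometric) form of the neighbour probability, rounds $\rho(t) N$ to an integer number of active users at each step, and uses a small hypothetical in-degree distribution `degree_pmf`; none of these choices or values come from the dataset.

```python
from math import comb


def theta(lam, n_users, n_active, degree_pmf):
    """Total activation probability per unit of time: average over the
    in-degree distribution of the probability of at least one activation."""
    others = n_users - 1
    total = 0.0
    for k_in, weight in degree_pmf.items():
        denom = comb(others, k_in)
        p_k = 0.0
        for j in range(1, min(k_in, n_active) + 1):
            # Probability of exactly j active in-neighbours (no correlations).
            hyper = comb(n_active, j) * comb(others - n_active, k_in - j) / denom
            # Probability that at least one of the j active links activates.
            p_k += hyper * (1.0 - (1.0 - lam) ** j)
        total += weight * p_k
    return total


def iterate_density(rho0, beta, lam, n_users, degree_pmf, steps):
    """Iterate rho(t+1) = (1 - beta) rho(t) + (1 - rho(t)) Theta(t)
    with constant beta and lam; returns [rho(0), ..., rho(steps)]."""
    rho = rho0
    history = [rho]
    for _ in range(steps):
        # Integer number of active users, capped so that at least the
        # user under consideration remains non-active.
        n_active = min(round(rho * n_users), n_users - 1)
        th = theta(lam, n_users, n_active, degree_pmf)
        rho = (1.0 - beta) * rho + (1.0 - rho) * th
        history.append(rho)
    return history


# Hypothetical in-degree distribution {k_in: probability}; it sums to 1.
degree_pmf = {1: 0.5, 2: 0.25, 5: 0.15, 20: 0.10}

if __name__ == "__main__":
    for t, rho in enumerate(iterate_density(0.001, 0.3, 0.2, 2000, degree_pmf, 10)):
        print(t, rho)
```

With $\lambda = 0$ the activation term vanishes and the density decays geometrically as $(1 - \beta)^t \rho(0)$, which gives a simple limiting case for checking an implementation.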

Let us focus only on Period IV, i.e., during and after
the main event, from 03:00 AM on 4th July to the end of
the data collection period. The initial fraction of active
users is approximately $\rho(0) = 0.1\%$ of the total number
of users in our dataset.

In order to assess the validity of our analytical model,
we perform large-scale Monte Carlo simulations of the
spreading dynamics through the network of observed
connections among users. More specifically, we consider
the case where activation and de-activation rates do not
change over time: we vary their values from 0 to 1,
independently; for each possible configuration corresponding
to the pair $(\beta, \lambda)$ we perform 200 random independent
realizations of gossip spreading and we calculate the
ensemble average at each time step $t$ to obtain an estimate
of the expected value of the density $\rho(t)$. The results for
the case with $\beta = 1$ and several different values of $\lambda$ are
shown in Fig. 8(a). In the left panel, we show the evolution
of $\rho(t)$ versus time: for $\lambda < 6 \times 10^{-3}$ the density $\rho(t)$
tends to decrease to zero for increasing time, while for
$\lambda > 6 \times 10^{-3}$ the density $\rho(t)$ tends to reach a stationary
state, indicating that the spreading becomes endemic.
We indicate with $\rho^*$ the stationary value reached by $\rho(t)$
after the transient time. In the right panel of Fig. 8(a)
we show the value of $\rho^*$ versus the activation rate: the
endemic state, where $\rho^* > 0$, is quickly reached for small
values of $\lambda$. This result is qualitatively confirmed by
our analytical model (see Eq. (8)) and it is in agreement
with the result reported in [33], stating that the epidemic
threshold of an endemic state tends to zero for increasing
network size with a scale-free topology. However, such
results do not reproduce the observed spreading dynamics,
whose density $\rho(t)$ is shown in Fig. 8(b). The data show
a quickly increasing number of users within a few hours,
with a maximum value reached at the beginning of the
International Conference on High-Energy Physics. Such
a fast increasing behavior can be explained by tweets
related to the excitement for a possible announcement of
the discovery of the Higgs boson. In fact, the number
of active users in the following hour rapidly decreases by
about 40%, staying stable for the subsequent 2 hours and
then decreasing again.
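
The Monte Carlo procedure with constant rates described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used for Fig. 8(a): it runs on a small hypothetical random directed graph rather than on the observed network, updates all users synchronously, and every function name and parameter value is illustrative.

```python
import random


def random_directed_graph(n, n_edges, rng):
    """Hypothetical test graph: for each user, the set of its in-neighbours."""
    in_neighbours = [set() for _ in range(n)]
    for _ in range(n_edges):
        src, dst = rng.randrange(n), rng.randrange(n)
        if src != dst:
            in_neighbours[dst].add(src)
    return in_neighbours


def simulate(in_neighbours, beta, lam, rho0, steps, rng):
    """One realization; returns the density of active users at each step."""
    n = len(in_neighbours)
    active = [rng.random() < rho0 for _ in range(n)]
    density = [sum(active) / n]
    for _ in range(steps):
        nxt = []
        for node in range(n):
            if active[node]:
                # An active user stays active with probability 1 - beta.
                nxt.append(rng.random() >= beta)
            else:
                # Activated with probability 1 - (1 - lam)^j, where j is
                # the number of currently active in-neighbours.
                j = sum(active[s] for s in in_neighbours[node])
                nxt.append(rng.random() < 1.0 - (1.0 - lam) ** j)
        active = nxt
        density.append(sum(active) / n)
    return density


def ensemble_average(in_neighbours, beta, lam, rho0, steps, runs, seed=0):
    """Average the density over `runs` independent realizations."""
    rng = random.Random(seed)
    totals = [0.0] * (steps + 1)
    for _ in range(runs):
        for t, value in enumerate(simulate(in_neighbours, beta, lam, rho0, steps, rng)):
            totals[t] += value
    return [x / runs for x in totals]


if __name__ == "__main__":
    graph = random_directed_graph(500, 3000, random.Random(1))
    # Illustrative parameters; 200 runs would match the text but is slower.
    print(ensemble_average(graph, beta=1.0, lam=0.3, rho0=0.01, steps=20, runs=20))
```

Sweeping `beta` and `lam` over a grid in $[0, 1]$ and averaging over realizations reproduces the structure of the experiment; the numerical values obtained on such a toy graph are not comparable to those reported for the real network.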

In the case of epidemics with constant activation rate in
scale-free networks with a large number of nodes we
expect the appearance of an endemic state. However, this
is not the case in our dataset. For this reason, we modify
the model by introducing a variable activation rate
$\lambda$, accounting for the decreasing interest in a tweet over
time. We model the evolution of $\lambda$ as follows:



$$\lambda(t + 1) = (1 - \xi) \lambda(t), \quad 0 < \xi < 1, \qquad (9)$$



which is the discrete counterpart of the continuous
equation whose solution is the exponential decay
$\lambda(t) = \lambda(t_0) e^{-\xi (t - t_0)}$. Here, we interpret $\xi$ as the inverse of a
characteristic scale $\tau$ regulating the decay dynamics. We
use the coupled equations (8) and (9) to model the
observed spreading dynamics.
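
A minimal sketch of the coupled dynamics is given below. It iterates the decay of the activation rate together with the density equation, taking $\xi = 1/\tau$. The function `toy_theta` is a placeholder assumption standing in for the full activation term: it supposes that every user has `k_in` in-neighbours, each independently active with probability $\rho$. All names and values are hypothetical, apart from $\beta = 0.17$, $\lambda = 1$ and $\tau \approx 1.13$, which echo the first sub-period reported in the text.

```python
from math import exp


def decayed_rates(lam0, tau, steps):
    """Discrete decay lam(t+1) = (1 - xi) lam(t), with xi = 1 / tau."""
    if tau <= 1.0:
        raise ValueError("tau must exceed 1 so that 0 < xi < 1")
    xi = 1.0 / tau
    rates = [lam0]
    for _ in range(steps):
        rates.append((1.0 - xi) * rates[-1])
    return rates


def toy_theta(lam, rho, k_in=10):
    """Placeholder activation term (an assumption, not the paper's Theta):
    probability of at least one activation from k_in in-neighbours that
    are each active with probability rho."""
    return 1.0 - (1.0 - lam * rho) ** k_in


def coupled_model(rho0, beta, lam0, tau, steps, theta_fn):
    """Iterate the density equation with a decaying activation rate."""
    if tau <= 1.0:
        raise ValueError("tau must exceed 1 so that 0 < xi < 1")
    rho, lam, xi = rho0, lam0, 1.0 / tau
    history = [rho]
    for _ in range(steps):
        rho = (1.0 - beta) * rho + (1.0 - rho) * theta_fn(lam, rho)
        lam = (1.0 - xi) * lam
        history.append(rho)
    return history


if __name__ == "__main__":
    # For slow decay the discrete rule tracks the exponential closely.
    slow = decayed_rates(1.0, 100.0, 10)
    print(slow[10], exp(-10 / 100.0))
    print(coupled_model(0.001, 0.17, 1.0, 1.13, 20, toy_theta))
```

Because the activation rate decays, the density rises at first and then falls back instead of settling into an endemic state, which is the qualitative behavior the variable rate is introduced to capture.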

During the whole Period IV, we identify five sub-periods,
each one characterized by an increasing number
of active users followed by a decreasing one. We then vary
the parameters $\beta$, $\lambda$ and $\tau$ in order to try to reproduce
the data in each sub-period. The solid curves in Fig. 8(b)
correspond to our model (equations (8) and (9)) using the
best-fitting set of parameters. The rapid increase of
active users in the first sub-period of Period IV is
followed by a fast decrease, with time scale $\tau \approx 1.13$ hours,
initial activation rate $\lambda = 1$ and $\beta = 0.17$. Such a
fast decreasing trend is slowed after about 9 hours,
approximately at the time when the gossip has reached the
other side of the world in the early morning: from this
time instant up to the end of the observation, the values
of the de-activation probability and the initial value of
the activation probability are almost constant (ranging
from 0.31 to 0.38, and from 0.40 to 0.45, respectively).
In the following sub-periods only the decay time scale $\tau$
varies significantly, from 3 to 17 hours.
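
The parameter variation described above can be illustrated with a simple grid search. The sketch below makes several assumptions that are not taken from the text: the objective is the sum of squared errors, the activation term is the same kind of placeholder `toy_theta` (fixed in-degree, neighbours independently active), and the "observed" series is synthetic, generated from the model itself so that the search has a known answer.

```python
from itertools import product


def toy_theta(lam, rho, k_in=10):
    """Placeholder activation term (assumption): at least one activation
    from k_in in-neighbours, each active with probability rho."""
    return 1.0 - (1.0 - lam * rho) ** k_in


def model_density(rho0, beta, lam0, tau, steps):
    """Density equation coupled to an activation rate decaying with xi = 1/tau."""
    rho, lam, xi = rho0, lam0, 1.0 / tau
    history = [rho]
    for _ in range(steps):
        rho = (1.0 - beta) * rho + (1.0 - rho) * toy_theta(lam, rho)
        lam = (1.0 - xi) * lam
        history.append(rho)
    return history


def squared_error(observed, predicted):
    """Sum of squared differences between two equally long series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def grid_search(observed, rho0, betas, lam0s, taus):
    """Return (error, beta, lam0, tau) with the smallest squared error."""
    steps = len(observed) - 1
    best = None
    for beta, lam0, tau in product(betas, lam0s, taus):
        err = squared_error(observed, model_density(rho0, beta, lam0, tau, steps))
        if best is None or err < best[0]:
            best = (err, beta, lam0, tau)
    return best


# Synthetic "observations" generated with known, hypothetical parameters.
synthetic = model_density(0.01, 0.2, 0.7, 3.0, 15)
best = grid_search(synthetic, 0.01,
                   betas=[0.1, 0.2, 0.3],
                   lam0s=[0.4, 0.7, 1.0],
                   taus=[1.5, 3.0, 6.0])

if __name__ == "__main__":
    print(best)
```

In practice one such search would be run per sub-period, with the observed density in place of the synthetic series and a finer grid or a proper optimizer in place of the coarse one used here.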



Discussion 

On 4th July 2012, the ATLAS and CMS collaborations
announced the discovery of a new particle, with the
same features as the elusive Higgs boson. Such a finding
represents a milestone in particle physics and a unique
occasion to study the dynamics of information spreading
on a global scale. In this study, we have monitored
user activities on Twitter before, during and after the
announcement of the discovery of this Higgs boson-like
particle.

The joint analysis of spatial and temporal user activity
patterns unveiled specific dynamics in different periods
of the gossip spreading. Before the announcement of the
teams based at CERN, tweets were more likely to be sent
within a few seconds by users living within 20 km. During
the main event the activity became frenetic and its
time scale reduced to 2 seconds without a specific spatial
pattern. After the main event a less frenetic activity was
observed while users from any part of the world were
still involved in the process, with no apparent prevalence
of small or large inter-tweet distances.

Figure 8: (a) Left panel: Evolution of the density of active users versus time obtained from simulations of spreading dynamics.
The de-activation rate is $\beta = 1$ and the different curves correspond to different values of the activation rate $\lambda$. Each curve
corresponds to the ensemble average of 200 random independent realizations. Right panel: Average value of the density of
active users in the stationary state, as a function of $\lambda$. (b) Observed evolution of the density of active users versus time (points)
in Period IV, i.e., during and after the main event, from 03:00 AM, 4th July. Curves indicate the predictions obtained from
the model defined by Eq. (8) coupled to Eq. (9), where the values of the corresponding parameters are reported in the figure for
different sub-periods. The reported $\lambda$ refers to the initial value of the activation rate.

Finally, we have focused our attention on the network
of individuals who posted at least one message about the
discovery (tweets, re-tweets or replies to a tweet). The
observed network of users exhibits a non-trivial structure
with a typically bursty tweeting process happening over
it. We have proposed a model for information spreading
with variable activation rate in heterogeneous networks,
showing that we are able to reproduce the collective
behavior of about 500,000 users with remarkable accuracy.

Although the proposed models have been developed for
this specific series of events, we believe that the
proposed framework can be applied and fitted to many other
spreading processes on online and physical social
networks.



Methods 

For each search, Twitter provides information about
the number of missing tweets, which is usually negligible.
In any case, it is worth noting that during this data
collection no messages related to missing tweets were
received: for this reason, to the best of our knowledge,
we might claim that this dataset includes all the tweets
satisfying our search criteria.

Our original list of relevant hashtags was larger,
including terms such as alice and cms. However, the amount
of tweets retrieved using these hashtags but not related
to the Higgs boson was not negligible and, for this
reason, we decided to exclude such terms from our
analysis.

The geographic names contained in the location
text field of Twitter user profiles were converted to
geographic coordinates using the Google Geocoder API.